A deterministic gradient-based approach to avoid saddle points
Authors
Abstract
Loss functions with a large number of saddle points are one of the major obstacles to training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$.
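As a rough illustration of the mechanism, the sketch below contrasts plain GD with Laplacian-smoothing GD on a canonical quadratic with a saddle at the origin. It follows the LSGD preconditioner $(I-\sigma L)^{-1}$ of Osher et al., arXiv:1806.06317, not the mLSGD modification analysed in the paper; the test function, step size and $\sigma$ are illustrative choices, not taken from the paper.

# Minimal sketch (not the paper's code): plain GD vs. Laplacian-smoothing GD
# on a canonical quadratic with a saddle at the origin.  LSGD preconditions
# the gradient by (I - sigma*L)^{-1}, with L the periodic 1-D discrete
# Laplacian (Osher et al., arXiv:1806.06317).  The mLSGD modification
# analysed in the paper is not reproduced; all parameters are illustrative.
import numpy as np

def smoothing_matrix(n, sigma):
    """A = I - sigma * L, where L is the periodic 1-D discrete Laplacian."""
    L = -2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = 1.0
    L[idx, (idx - 1) % n] = 1.0
    return np.eye(n) - sigma * L

d = np.array([1.0, 2.0, -1.0, -2.0])       # f(x) = 0.5 * sum_i d_i x_i^2, saddle at 0
grad = lambda x: d * x
x0 = np.array([1.0, 1.0, 0.0, 0.0])        # lies in GD's attraction region of the saddle
A_inv = np.linalg.inv(smoothing_matrix(len(d), sigma=1.0))

x_gd, x_ls, eta = x0.copy(), x0.copy(), 0.1
for _ in range(100):
    x_gd -= eta * grad(x_gd)               # plain GD: stays on the stable manifold
    x_ls -= eta * A_inv @ grad(x_ls)       # smoothed gradient couples the coordinates

print("GD  :", x_gd)   # converges towards the saddle at the origin
print("LSGD:", x_ls)   # drifts away along the negative-curvature directions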
Similar resources
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stat...
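A rough sketch of the contrast, assuming the plain Nesterov-style update rather than the paper's full algorithm (which adds perturbations and negative-curvature exploitation): momentum amplifies the unstable direction of a strict saddle in fewer iterations. The step size, momentum parameter, start point and escape radius below are illustrative choices.

# Minimal sketch (not the paper's algorithm): plain GD vs. Nesterov-style
# momentum near the strict saddle of f(x, y) = 0.5*(x**2 - y**2).  The paper's
# method additionally uses perturbations and negative-curvature exploitation,
# omitted here; eta, theta, the start point and the escape radius are
# illustrative choices.
import numpy as np

grad = lambda p: np.array([p[0], -p[1]])     # gradient of 0.5*(x^2 - y^2)

def escape_time(theta, start=np.array([1.0, 1e-6]), eta=0.1):
    x = y = start.copy()
    for k in range(1, 10_000):
        x_new = y - eta * grad(y)            # gradient step at the look-ahead point
        y = x_new + theta * (x_new - x)      # momentum extrapolation (theta=0 -> plain GD)
        x = x_new
        if abs(x[1]) > 1.0:                  # left the saddle's neighbourhood
            return k
    return None

print("GD  (theta=0.0) escape iterations:", escape_time(theta=0.0))
print("AGD (theta=0.9) escape iterations:", escape_time(theta=0.9))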
Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not...
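A toy version of this contrast, not the paper's exponential-time construction: GD started exactly on the stable manifold of a strict saddle never leaves it, while GD with small random perturbations (in the spirit of Ge et al., 2015 and Jin et al., 2017) is pushed off the manifold and escapes. The function, noise scale and iteration count are illustrative.

# Minimal sketch: GD started on the stable manifold of a strict saddle never
# leaves it, while GD with small random perturbations (in the spirit of
# Ge et al., 2015 / Jin et al., 2017) does.  f, the noise scale and the
# iteration count are illustrative; the paper's exponential-time construction
# chains many such saddles and is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
grad = lambda p: np.array([p[0], -p[1]])   # f(x, y) = 0.5*(x**2 - y**2), saddle at (0, 0)

eta, noise = 0.1, 1e-3
p_gd = np.array([1.0, 0.0])                # y = 0: exactly on the stable manifold
p_pgd = p_gd.copy()
for _ in range(200):
    p_gd = p_gd - eta * grad(p_gd)
    p_pgd = p_pgd - eta * grad(p_pgd) + noise * rng.standard_normal(2)

print("GD       :", p_gd)    # y stays exactly 0; the iterate approaches the saddle
print("perturbed:", p_pgd)   # the noise seeds y, which then grows away from the saddle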
First-order Methods Almost Always Avoid Saddle Points
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold...
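A minimal sketch of this viewpoint, with an illustrative Hessian and step size: gradient descent is the map $g(x) = x - \eta\nabla f(x)$, whose Jacobian at a critical point is $I - \eta\nabla^2 f(x)$; at a strict saddle that Jacobian has an eigenvalue larger than 1, which is what the stable-manifold argument exploits to show that the set of attracted initial points is lower-dimensional.

# Minimal sketch of the dynamical-systems view: GD is the map
# g(x) = x - eta * grad f(x), with Jacobian I - eta * Hessian(f) at a critical
# point.  At a strict saddle the Hessian has a negative eigenvalue, so the
# Jacobian has an eigenvalue > 1 and the saddle's stable manifold is
# lower-dimensional.  H and eta are illustrative choices.
import numpy as np

eta = 0.1
H = np.diag([1.0, 2.0, -1.0])              # Hessian at a strict saddle (one negative eigenvalue)
jacobian = np.eye(3) - eta * H             # Jacobian of the GD map at the critical point
eigs = np.linalg.eigvalsh(jacobian)

print("eigenvalues of the GD map's Jacobian:", eigs)
print("unstable directions (|lambda| > 1):", int(np.sum(np.abs(eigs) > 1)))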
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them ...
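One common second-order ingredient is a negative-curvature step, sketched below purely for illustration (this is not the cited paper's algorithm): if the Hessian at the current point has a negative eigenvalue, move along the corresponding eigenvector on its first-order-descent side. Practical methods replace the full eigendecomposition used here with cheaper Hessian-vector products; the function, point and step size are illustrative.

# Minimal sketch of a negative-curvature escape step, not the cited paper's
# algorithm: if the Hessian at the current point has a negative eigenvalue,
# step along the corresponding eigenvector, choosing the side that is also a
# first-order descent direction.  f, the point and the step size are
# illustrative.
import numpy as np

def f_grad_hess(p):
    x, y = p
    g = np.array([x, -y])                    # gradient of f(x, y) = 0.5*(x**2 - y**2)
    H = np.array([[1.0, 0.0], [0.0, -1.0]])  # Hessian of f
    return g, H

p = np.array([0.0, 1e-8])                    # almost exactly at the saddle (0, 0)
g, H = f_grad_hess(p)
lam, V = np.linalg.eigh(H)

if lam[0] < 0:                               # negative curvature found
    d = V[:, 0]
    d = -d if d @ g > 0 else d               # pick the first-order descent side
    p = p + 0.5 * d                          # escape step along negative curvature
else:
    p = p - 0.1 * g                          # otherwise an ordinary gradient step

print("new iterate:", p)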
A geometric approach to saddle points of surfaces
We outline an alternative approach to the geometric notion of a saddle point for real-valued functions of two real variables. It is argued that our treatment is more natural than the usual treatment of this topic in standard texts on calculus.
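For reference, the standard analytic criterion that such geometric treatments complement is the second-derivative test from calculus (not taken from the cited paper): at a critical point $(a,b)$ of $f(x,y)$, a negative Hessian determinant identifies a saddle,
\[
  D(a,b) = f_{xx}(a,b)\,f_{yy}(a,b) - \bigl(f_{xy}(a,b)\bigr)^{2}, \qquad D(a,b) < 0 \;\Rightarrow\; (a,b)\ \text{is a saddle point of } f.
\]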
Journal
Journal title: European Journal of Applied Mathematics
Year: 2022
ISSN: 0956-7925, 1469-4425
DOI: https://doi.org/10.1017/s0956792522000316